
    Canonical, Stable, General Mapping using Context Schemes

    Motivation: Sequence mapping is the cornerstone of modern genomics. However, most existing sequence mapping algorithms are insufficiently general. Results: We introduce context schemes: a method that allows the unambiguous recognition of a reference base in a query sequence by testing the query for substrings from an algorithmically defined set. Context schemes map only when there is a unique best mapping, and they define this criterion uniformly for all reference bases. Mappings under context schemes can also be made stable, so that extending the query string (e.g., by increasing read length) will not alter the mapping of previously mapped positions. Context schemes are general in several senses. They natively support the detection of arbitrarily complex, novel rearrangements relative to the reference. They can scale over orders of magnitude in query sequence length. Finally, they are trivially extensible to more complex reference structures, such as graphs, that incorporate additional variation. We demonstrate empirically the existence of high-performance context schemes, and present efficient context scheme mapping algorithms. Availability and Implementation: The software test framework created for this work is available from https://registry.hub.docker.com/u/adamnovak/sequence-graphs/. Supplementary Information: Six supplementary figures and one supplementary section are available with the online version of this article. Comment: Submission for Bioinformatic
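    The core criterion above (map a query base only when its context pins down one reference position) can be illustrated with a toy mapper. This is a minimal sketch under our own assumptions, not the authors' context scheme definition: contexts are exact query substrings up to a hypothetical max_len, "unique" means occurring exactly once in a linear reference string, and a base is left unmapped when its unique contexts disagree. It does not implement the stability guarantee or graph references, and the helper names occurrences and map_query are hypothetical.

```python
from typing import Dict, List, Optional


def occurrences(reference: str, pattern: str) -> List[int]:
    """Return every start index of pattern in reference, overlaps included."""
    hits = []
    start = reference.find(pattern)
    while start != -1:
        hits.append(start)
        start = reference.find(pattern, start + 1)
    return hits


def map_query(reference: str, query: str, max_len: int = 8) -> Dict[int, Optional[int]]:
    """Toy unique-context mapper (hypothetical helper, not the paper's algorithm).

    Query offset i maps to reference offset j only if some query substring
    covering i occurs exactly once in the reference, and every such unique
    substring places i at the same j. Otherwise i maps to None.
    """
    result: Dict[int, Optional[int]] = {}
    for i in range(len(query)):
        votes = set()  # reference positions implied by unique contexts
        for length in range(1, max_len + 1):
            # Every window [s, s + length) of the query that covers offset i.
            for s in range(max(0, i - length + 1), min(i, len(query) - length) + 1):
                hits = occurrences(reference, query[s:s + length])
                if len(hits) == 1:
                    votes.add(hits[0] + (i - s))
        # Map only when the unique contexts agree on a single position.
        result[i] = votes.pop() if len(votes) == 1 else None
    return result
```

    For example, map_query("ACGTTGCA", "GTTG") maps the four query bases to reference offsets 2 through 5, while querying "ACGT" against the repetitive reference "ACGTACGT" leaves every base unmapped, because no context occurs only once.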

    An Average-Case Sublinear Exact Li and Stephens Forward Algorithm

    Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. To make the Li and Stephens forward algorithm computationally tractable for these datasets, we have created a numerically exact version of the algorithm with observed average-case O(nk^{0.35}) runtime in number of genetic sites n and reference panel size k. This avoids any tradeoff between runtime and model complexity. We demonstrate that our approach also provides a succinct data structure for general-purpose haplotype data storage. We discuss generalizations of our algorithmic techniques to other hidden Markov models.
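    For reference, the linear-in-k baseline that this work speeds up can be sketched as the standard dense forward recursion. The parameterization here is an assumption for illustration: a uniform start over the k panel haplotypes, a per-site recombination probability rho that redraws the copied haplotype uniformly, and biallelic emissions of 1 - mu on a match and mu on a mismatch. The function name li_stephens_forward is hypothetical. Each site costs O(k) because every haplotype's value is updated, giving O(nk) overall.

```python
from typing import Sequence


def li_stephens_forward(panel: Sequence[Sequence[int]], obs: Sequence[int],
                        rho: float, mu: float) -> float:
    """Return P(obs) under a simple haploid Li and Stephens HMM (dense O(nk)).

    panel[h][j] is the allele (0 or 1) of reference haplotype h at site j.
    Assumed model: with probability rho the copied haplotype is redrawn
    uniformly from all k haplotypes between sites; emissions are 1 - mu on
    a match and mu on a mismatch.
    """
    k = len(panel)
    n = len(obs)

    def emit(h: int, j: int) -> float:
        return 1.0 - mu if panel[h][j] == obs[j] else mu

    # Site 0: uniform start over the k haplotypes, times the emission.
    f = [emit(h, 0) / k for h in range(k)]
    for j in range(1, n):
        total = sum(f)  # shared recombination term, computed once per site
        f = [emit(h, j) * ((1.0 - rho) * f[h] + rho * total / k)
             for h in range(k)]
    return sum(f)
```

    Computing the sum once per site keeps the update linear in k rather than quadratic. Over long chromosomes these raw probabilities underflow, so a practical implementation would rescale each column or work in log space.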

    An average-case sublinear forward algorithm for the haploid Li and Stephens model

    Background: Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. Results: To make the forward algorithm for the haploid Li and Stephens model computationally tractable for these datasets, we have created a numerically exact version of the algorithm with observed average-case sublinear runtime with respect to reference panel size k when tested against the 1000 Genomes dataset. Conclusions: We present a forward algorithm that avoids any tradeoff between runtime and model complexity. Our algorithm makes use of two general strategies that might also improve the time complexity of future sequence analysis algorithms: sparse dynamic programming matrices and lazy evaluation.
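    One way the two named strategies can combine is sketched below. This is our own simplified construction for illustration, not the authors' data structures: each site is stored sparsely as the list of haplotypes carrying the minor allele, and all major-allele carriers share one lazily applied affine map G(x) = a*x + b, so a haplotype's true forward value is G(u[h]) for a stored pre-image u[h]. Only minor-allele carriers are evaluated explicitly, so site j costs O(m_j) for m_j carriers instead of O(k). The model parameterization matches a simple biallelic haploid model (uniform start, recombination probability rho, mismatch probability mu), and the name lazy_forward is hypothetical.

```python
from typing import Dict, Sequence


def lazy_forward(k: int, minor_carriers: Sequence[Sequence[int]],
                 obs_is_minor: Sequence[bool], rho: float, mu: float) -> float:
    """P(obs) for a toy haploid Li and Stephens model (illustrative sketch).

    minor_carriers[j] lists haplotypes carrying the minor allele at site j
    (sparse storage); all others carry the major allele. Requires
    0 <= rho < 1 and 0 < mu < 1 so the global map stays invertible.
    """
    a, b = 1.0, 0.0           # global affine map G(x) = a*x + b
    u: Dict[int, float] = {}  # sparse pre-images; missing entries mean 1/k
    total = 1.0               # running sum of all forward values
    for j, carriers in enumerate(minor_carriers):
        minors = set(carriers)
        e_major = 1.0 - mu if not obs_is_minor[j] else mu
        e_minor = 1.0 - mu if obs_is_minor[j] else mu
        # Lazily evaluate only the minor-allele carriers: f_h = G(u_h).
        old = {h: a * u.get(h, 1.0 / k) + b for h in minors}
        # Transition x -> c*x + d (no transition before the first site).
        c, d = (1.0, 0.0) if j == 0 else (1.0 - rho, rho * total / k)
        new = {h: e_minor * (c * f + d) for h, f in old.items()}
        # Major-allele carriers: fold their update into the global map.
        a, b = e_major * c * a, e_major * (c * b + d)
        total = (e_major * (c * (total - sum(old.values())) + d * (k - len(minors)))
                 + sum(new.values()))
        # Store pre-images so that G(u_h) reproduces the explicit values.
        for h, f in new.items():
            u[h] = (f - b) / a
    return total
```

    The result is numerically the same quantity a dense forward pass would compute for this model, but the repeated division by a shrinking scale factor a loses precision and eventually underflows on long inputs, so a practical version would periodically renormalize. The sketch only shows why rare minor alleles make the per-site work depend on carrier count rather than panel size.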

    Picroscope: low-cost system for simultaneous longitudinal biological imaging

    Simultaneous longitudinal imaging across multiple conditions and replicates has been crucial for scientific studies aiming to understand biological processes and disease. Yet imaging systems capable of these tasks are too expensive for most academic and teaching laboratories around the world. Here, we propose the Picroscope, the first low-cost system for simultaneous longitudinal biological imaging, made primarily from off-the-shelf and 3D-printed materials. The Picroscope is compatible with standard 24-well cell culture plates and captures 3D z-stack image data. It can be controlled remotely, allowing automatic imaging with minimal intervention from the investigator. We use this system in a range of applications. We gathered longitudinal whole-organism image data for frogs, zebrafish, and planarian worms. We also gathered image data inside an incubator to observe 2D monolayers and 3D mammalian tissue culture models. Using this tool, we can measure the behavior of entire organisms or individual cells over long periods of time.